COVID-19 Data Visualization

Manuela da Cruz Chadreque

10 December 2020



Setup and Sourcing

Importing modules

In [141]:
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt 
print('modules are imported')
modules are imported

Loading the Dataset

In [142]:
url='https://datahub.io/core/covid-19/r/countries-aggregated.csv'
df=pd.read_csv(url)
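
The loading step can be tried offline on a small inline sample. This is only a sketch: the two rows are made-up stand-ins with the same column names as the dataset, and parsing the Date column with parse_dates is an assumption, not something the notebook itself does.

```python
import io
import pandas as pd

# Hypothetical inline sample with the same columns as countries-aggregated.csv
sample_csv = io.StringIO(
    "Date,Country,Confirmed,Recovered,Deaths\n"
    "2020-03-05,South Africa,1,0,0\n"
    "2020-03-08,South Africa,3,0,0\n"
)

# parse_dates turns the Date column into datetime values instead of plain strings
sample_df = pd.read_csv(sample_csv, parse_dates=["Date"])
print(sample_df.dtypes)
```

With a datetime column, later date comparisons and plots do not depend on string ordering.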

let's check the dataframe

In [143]:
df.head()
Out[143]:
Date Country Confirmed Recovered Deaths
0 2020-01-22 Afghanistan 0 0 0
1 2020-01-23 Afghanistan 0 0 0
2 2020-01-24 Afghanistan 0 0 0
3 2020-01-25 Afghanistan 0 0 0
4 2020-01-26 Afghanistan 0 0 0
In [144]:
df.tail()
Out[144]:
Date Country Confirmed Recovered Deaths
69519 2021-01-15 Zimbabwe 26109 15414 666
69520 2021-01-16 Zimbabwe 26881 15872 683
69521 2021-01-17 Zimbabwe 27203 16512 713
69522 2021-01-18 Zimbabwe 27892 17372 773
69523 2021-01-19 Zimbabwe 28675 18110 825

let's check the shape of the dataframe

In [145]:
df.shape
Out[145]:
(69524, 5)

let's do some preprocessing

In [146]:
df=df[df.Confirmed>0] #keep each country's data from its first confirmed case onward
In [147]:
df.head()
Out[147]:
Date Country Confirmed Recovered Deaths
33 2020-02-24 Afghanistan 1 0 0
34 2020-02-25 Afghanistan 1 0 0
35 2020-02-26 Afghanistan 1 0 0
36 2020-02-27 Afghanistan 1 0 0
37 2020-02-28 Afghanistan 1 0 0
In [148]:
df[df.Country=='South Africa']
Out[148]:
Date Country Confirmed Recovered Deaths
57919 2020-03-05 South Africa 1 0 0
57920 2020-03-06 South Africa 1 0 0
57921 2020-03-07 South Africa 1 0 0
57922 2020-03-08 South Africa 3 0 0
57923 2020-03-09 South Africa 3 0 0
... ... ... ... ... ...
58235 2021-01-15 South Africa 1311686 1062690 36467
58236 2021-01-16 South Africa 1325659 1083978 36851
58237 2021-01-17 South Africa 1337926 1098441 37105
58238 2021-01-18 South Africa 1346936 1117452 37449
58239 2021-01-19 South Africa 1356716 1144857 38288

321 rows × 5 columns
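
The effect of the Confirmed>0 filter can be sketched on a toy frame. The rows below are illustrative, not taken from the dataset; because Confirmed is cumulative, dropping the zero rows leaves each country's series starting at its first case.

```python
import pandas as pd

# Toy cumulative counts: two days with no cases, then the first case
toy = pd.DataFrame({
    "Date": ["2020-03-03", "2020-03-04", "2020-03-05", "2020-03-06"],
    "Country": ["South Africa"] * 4,
    "Confirmed": [0, 0, 1, 1],
})

# Keep only rows from the first confirmed case onward
toy_filtered = toy[toy["Confirmed"] > 0]
first_case_date = toy_filtered["Date"].iloc[0]
print(first_case_date)
```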

Global spread of Covid19

In [185]:
fig=px.choropleth(df, locations='Country',locationmode='country names', color='Confirmed', animation_frame='Date')
fig.update_layout(title_text='Global Spread of Covid19')
fig.show()

Global deaths of Covid19

In [150]:
fig1=px.choropleth(df, locations='Country',locationmode='country names', color='Deaths', animation_frame='Date')
fig1.update_layout(title_text='Global Deaths of Covid19')
fig1.show()

How Intensive Covid19 Transmission Has Been in South Africa

In [151]:
df_sa=df[df.Country=='South Africa']
df_sa.head()
Out[151]:
Date Country Confirmed Recovered Deaths
57919 2020-03-05 South Africa 1 0 0
57920 2020-03-06 South Africa 1 0 0
57921 2020-03-07 South Africa 1 0 0
57922 2020-03-08 South Africa 3 0 0
57923 2020-03-09 South Africa 3 0 0

let's select the columns that we need

In [152]:
df_sa=df_sa[['Date','Confirmed']]
df_sa=df_sa[df_sa.Confirmed>0] #build the mask from df_sa itself so the indexes match

In [153]:
df_sa.head()
Out[153]:
Date Confirmed
57919 2020-03-05 1
57920 2020-03-06 1
57921 2020-03-07 1
57922 2020-03-08 3
57923 2020-03-09 3

calculating the first difference of the Confirmed column (new cases per day)

In [154]:
df_sa['Infection Rate']=df_sa['Confirmed'].diff()
In [155]:
df_sa.head()
Out[155]:
Date Confirmed Infection Rate
57919 2020-03-05 1 NaN
57920 2020-03-06 1 0.0
57921 2020-03-07 1 0.0
57922 2020-03-08 3 2.0
57923 2020-03-09 3 0.0
In [156]:
px.line(df_sa, x='Date', y=['Confirmed','Infection Rate'])
In [157]:
df_sa['Infection Rate'].max() #largest one-day increase in confirmed cases in South Africa
Out[157]:
21980.0
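
What diff() does to a cumulative series can be sketched with the first five South Africa values shown earlier. The variable names here are illustrative only.

```python
import pandas as pd

# First five cumulative Confirmed values for South Africa, as shown above
confirmed = pd.Series([1, 1, 1, 3, 3], name="Confirmed")

# diff() subtracts the previous row, giving new cases per day;
# the first row has no predecessor, so it becomes NaN
daily_new = confirmed.diff()
peak = daily_new.max()  # max() skips the NaN
print(daily_new.tolist(), peak)
```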

Let's Calculate Maximum infection rate for all of the countries

In [158]:
df.head()
Out[158]:
Date Country Confirmed Recovered Deaths
33 2020-02-24 Afghanistan 1 0 0
34 2020-02-25 Afghanistan 1 0 0
35 2020-02-26 Afghanistan 1 0 0
36 2020-02-27 Afghanistan 1 0 0
37 2020-02-28 Afghanistan 1 0 0

let's list the unique countries

In [159]:
countries=list(df['Country'].unique())

Let's plot the barchart : maximum infection rate of each country

In [160]:
max_infection_rates=[]
for c in countries:
    MIR=df[df.Country==c].Confirmed.diff().max()
    max_infection_rates.append(MIR)
print(max_infection_rates)
df_MIR=pd.DataFrame()
df_MIR['Country']=countries
df_MIR['Max Infection Rate']=max_infection_rates
df_MIR.head()
px.bar(df_MIR,x='Country',y='Max Infection Rate', color='Country', title='Global Maximum Infection', log_y=True)
[1485.0, 879.0, 1133.0, 299.0, 355.0, 39.0, 18326.0, 2476.0, 716.0, 9586.0, 4451.0, 310.0, 841.0, 4019.0, 161.0, 1975.0, 23921.0, 1382.0, 139.0, 57.0, 2573.0, 1953.0, 1328.0, 87843.0, 26.0, 4828.0, 315.0, 2158.0, 101.0, 159.0, 31.0, 2324.0, 16141.0, 216.0, 91.0, 13990.0, 15136.0, 21078.0, 234.0, 649.0, 476.0, 3115.0, 430.0, 4620.0, 650.0, 907.0, 17773.0, 4508.0, 99.0, 280.0, 17.0, 2370.0, 11536.0, 1774.0, 885.0, 1750.0, 209.0, 1084.0, 276.0, 1829.0, 5.0, 840.0, 106091.0, 570.0, 248.0, 5450.0, 49044.0, 1513.0, 3316.0, 26.0, 4233.0, 278.0, 156.0, 133.0, 332.0, 7.0, 1386.0, 6819.0, 106.0, 97894.0, 14224.0, 14051.0, 5055.0, 8227.0, 9997.0, 40902.0, 244.0, 7863.0, 7933.0, 18757.0, 1554.0, 1237.0, 2220.0, 1073.0, 11505.0, 14.0, 1861.0, 6154.0, 931.0, 97.0, 1639.0, 62.0, 4551.0, 1967.0, 7.0, 614.0, 1232.0, 4029.0, 215.0, 233.0, 245.0, 3.0, 296.0, 41.0, 28115.0, 1766.0, 40.0, 87.0, 874.0, 6195.0, 895.0, 683.0, 5743.0, 13072.0, 89.0, 480.0, 145.0, 1867.0, 1402.0, 1680.0, 2685.0, 12073.0, 5186.0, 73.0, 1268.0, 21358.0, 6725.0, 32733.0, 10947.0, 2355.0, 10269.0, 29499.0, 289.0, 5.0, 43.0, 110.0, 1.0, 76.0, 151.0, 4919.0, 342.0, 7999.0, 78.0, 86.0, 1426.0, 6315.0, 3354.0, 5.0, 288.0, 21980.0, 323.0, 84287.0, 878.0, 2414.0, 139.0, 32485.0, 21926.0, 169.0, 27.0, 211.0, 181.0, 745.0, 10.0, 67.0, 217.0, 5752.0, 823225.0, 298031.0, 1859.0, 16585.0, 3491.0, 68192.0, 1514.0, 981.0, 0.0, 1281.0, 50.0, 2516.0, 116.0, 1796.0, 1365.0]
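
The per-country loop above could also be written with groupby. The following is a sketch on toy data (countries "A" and "B" are hypothetical), not the notebook's own method; it assumes the same Country and Confirmed column names.

```python
import pandas as pd

# Toy cumulative counts for two hypothetical countries
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Confirmed": [1, 4, 6, 2, 3, 10],
})

# For each country: daily differences, then the largest one
max_rates = (
    toy.groupby("Country")["Confirmed"]
       .apply(lambda s: s.diff().max())
       .rename("Max Infection Rate")
       .reset_index()
)
print(max_rates)
```

The result has one row per country, in the same shape as df_MIR, so it could be passed to px.bar in the same way.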

Let's See how Festive Season Impacts Covid19 transmission in South Africa

COVID19 pandemic lockdown in South Africa

Government has announced tighter level 1 lockdown restrictions nationwide, with effect from midnight tonight. This as the country enters the festive season amid a second wave of rising COVID-19 infections. source

In [161]:
sa_festive_season_start_date = '2020-12-15'
sa_festive_season_end = '2021-01-05'
In [162]:
df.head()
Out[162]:
Date Country Confirmed Recovered Deaths
33 2020-02-24 Afghanistan 1 0 0
34 2020-02-25 Afghanistan 1 0 0
35 2020-02-26 Afghanistan 1 0 0
36 2020-02-27 Afghanistan 1 0 0
37 2020-02-28 Afghanistan 1 0 0

let's get data related to South Africa

In [163]:
df_sa=df[df.Country=='South Africa']

let's check the dataframe

In [164]:
df_sa.head()
Out[164]:
Date Country Confirmed Recovered Deaths
57919 2020-03-05 South Africa 1 0 0
57920 2020-03-06 South Africa 1 0 0
57921 2020-03-07 South Africa 1 0 0
57922 2020-03-08 South Africa 3 0 0
57923 2020-03-09 South Africa 3 0 0
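
When new columns are added to a filtered subset like this one, pandas may emit a SettingWithCopyWarning, depending on the version. A common way around it is to take an explicit copy first. The sketch below uses a toy frame; the names full and subset are illustrative.

```python
import pandas as pd

# Toy stand-in for the full dataframe
full = pd.DataFrame({
    "Country": ["Nigeria", "South Africa", "South Africa"],
    "Confirmed": [1, 1, 3],
})

# .copy() makes the subset independent of the original frame
subset = full[full["Country"] == "South Africa"].copy()

# Adding a column now touches only the copy
subset["Infection Rate"] = subset["Confirmed"].diff()
print(subset)
```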

let's calculate the infection rate in South Africa

In [165]:
df_sa=df_sa.copy() #independent copy, so adding columns does not raise SettingWithCopyWarning
df_sa['Infection Rate']=df_sa.Confirmed.diff()

ok! now let's do the visualization

In [166]:
fig=px.line(df_sa, x='Date', y='Infection Rate', title="Festive Season in South Africa")
fig.add_shape(
    dict(
        type='line',
        x0=sa_festive_season_start_date,
        y0=0,
        x1=sa_festive_season_start_date,
        y1=df_sa['Infection Rate'].max(),
        line=dict(color='red', width=2)  
    )
)
fig.add_annotation(
    dict(
        x=sa_festive_season_start_date,
        y=df_sa['Infection Rate'].max(),
        text='starting festive season'
    )
)



fig.add_shape(
    dict(
        type='line',
        x0=sa_festive_season_end,
        y0=0,
        x1=sa_festive_season_end,
        y1=df_sa['Infection Rate'].max(),
        line=dict(color='red', width=2)  
    )
)

fig.add_annotation(
    dict(
        x=sa_festive_season_end,
        y=df_sa['Infection Rate'].min(),
        text='End of festive season'
    )
)
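
To put numbers next to the two red markers, one option is to compare the average daily increase inside and outside the marked window. This is a sketch on made-up values, not a result from the dataset; it assumes the Date column has been converted to datetime, and the names toy, start and end are hypothetical.

```python
import pandas as pd

# Made-up daily increases around the festive-season window
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-12-13", "2020-12-14", "2020-12-15",
        "2020-12-16", "2021-01-05", "2021-01-06",
    ]),
    "Infection Rate": [10.0, 20.0, 30.0, 50.0, 40.0, 5.0],
})

start = pd.Timestamp("2020-12-15")
end = pd.Timestamp("2021-01-05")

# Boolean mask for rows inside the window (both end dates included)
in_window = toy["Date"].between(start, end)

mean_inside = toy.loc[in_window, "Infection Rate"].mean()
mean_outside = toy.loc[~in_window, "Infection Rate"].mean()
print(mean_inside, mean_outside)
```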

Let's See how the Festive Season Impacts Covid19 Deaths in South Africa

In [167]:
df_sa.head()
Out[167]:
Date Country Confirmed Recovered Deaths Infection Rate
57919 2020-03-05 South Africa 1 0 0 NaN
57920 2020-03-06 South Africa 1 0 0 0.0
57921 2020-03-07 South Africa 1 0 0 0.0
57922 2020-03-08 South Africa 3 0 0 2.0
57923 2020-03-09 South Africa 3 0 0 0.0

let's calculate the number of new deaths day by day

In [168]:
df_sa['Deaths Rate']=df_sa.Deaths.diff()
df_sa.head()
Out[168]:
Date Country Confirmed Recovered Deaths Infection Rate Deaths Rate
57919 2020-03-05 South Africa 1 0 0 NaN NaN
57920 2020-03-06 South Africa 1 0 0 0.0 0.0
57921 2020-03-07 South Africa 1 0 0 0.0 0.0
57922 2020-03-08 South Africa 3 0 0 2.0 0.0
57923 2020-03-09 South Africa 3 0 0 0.0 0.0
In [169]:
fig=px.line(df_sa,x='Date',y=['Infection Rate', 'Deaths Rate'])

fig.show()
In [170]:
df_sa['Infection Rate']=df_sa['Infection Rate']/df_sa['Infection Rate'].max()
df_sa['Deaths Rate']=df_sa['Deaths Rate']/df_sa['Deaths Rate'].max()

let's check the dataframe again

In [171]:
df_sa.head()
Out[171]:
Date Country Confirmed Recovered Deaths Infection Rate Deaths Rate
57919 2020-03-05 South Africa 1 0 0 NaN NaN
57920 2020-03-06 South Africa 1 0 0 0.000000 0.0
57921 2020-03-07 South Africa 1 0 0 0.000000 0.0
57922 2020-03-08 South Africa 3 0 0 0.000091 0.0
57923 2020-03-09 South Africa 3 0 0 0.000000 0.0
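
The scaling step divides each series by its own maximum, so both peak at 1 and can share one axis. A minimal sketch on toy values (the numbers are illustrative):

```python
import pandas as pd

# Toy daily increases on very different scales
toy = pd.DataFrame({
    "Infection Rate": [0.0, 2.0, 8.0, 4.0],
    "Deaths Rate": [0.0, 1.0, 1.0, 2.0],
})

# toy.max() is the per-column maximum; the division is applied column by column
scaled = toy / toy.max()
print(scaled)
```

After scaling, the values are fractions of each column's peak, so the absolute counts can no longer be read off the chart.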

now let's plot a line chart to compare the scaled infection rate and deaths rate in South Africa

In [172]:
fig=px.line(df_sa,x='Date',y=['Infection Rate', 'Deaths Rate'])
fig.show()
In [173]:
fig=px.line(df_sa, x='Date', y='Deaths Rate', title="Festive Season Deaths in South Africa")
#fig=px.line(df_sa,x='Date',y=['Infection Rate', 'Deaths Rate'])
#vertical marker at the start of the festive season, sized to the plotted Deaths Rate series
fig.add_shape(
    dict(
        type='line',
        x0=sa_festive_season_start_date,
        y0=0,
        x1=sa_festive_season_start_date,
        y1=df_sa['Deaths Rate'].max(),
        line=dict(color='red', width=2)
    )
)
fig.add_annotation(
    dict(
        x=sa_festive_season_start_date,
        y=df_sa['Deaths Rate'].max(),
        text='starting festive season'
    )
)

#vertical marker at the end of the festive season
fig.add_shape(
    dict(
        type='line',
        x0=sa_festive_season_end,
        y0=0,
        x1=sa_festive_season_end,
        y1=df_sa['Deaths Rate'].max(),
        line=dict(color='red', width=2)
    )
)

fig.add_annotation(
    dict(
        x=sa_festive_season_end,
        y=df_sa['Deaths Rate'].min(),
        text='End of festive season'
    )
)

COVID19 pandemic festive season in Nigeria

In [174]:
Nigeria_season_start_date = '2020-12-15' 
Nigeria_season_a_month_later = '2021-01-05'

let's select the data related to Nigeria

In [175]:
df_Nigeria=df[df.Country=='Nigeria']

let's check the dataframe

In [176]:
df_Nigeria.head()
Out[176]:
Date Country Confirmed Recovered Deaths
46265 2020-02-28 Nigeria 1 0 0
46266 2020-02-29 Nigeria 1 0 0
46267 2020-03-01 Nigeria 1 0 0
46268 2020-03-02 Nigeria 1 0 0
46269 2020-03-03 Nigeria 1 0 0

adding the Infection Rate column

In [177]:
df_Nigeria=df_Nigeria.copy() #independent copy, so adding columns does not raise SettingWithCopyWarning
df_Nigeria['Infection Rate']=df_Nigeria.Confirmed.diff()

let's check it again

In [178]:
df_Nigeria.head()
Out[178]:
Date Country Confirmed Recovered Deaths Infection Rate
46265 2020-02-28 Nigeria 1 0 0 NaN
46266 2020-02-29 Nigeria 1 0 0 0.0
46267 2020-03-01 Nigeria 1 0 0 0.0
46268 2020-03-02 Nigeria 1 0 0 0.0
46269 2020-03-03 Nigeria 1 0 0 0.0

let's calculate the infection rate in Nigeria

In [179]:
df_Nigeria['Infection Rate']=df_Nigeria.Confirmed.diff() #new confirmed cases per day in Nigeria

let's check the dataframe

In [180]:
df_Nigeria.head()
Out[180]:
Date Country Confirmed Recovered Deaths Infection Rate
46265 2020-02-28 Nigeria 1 0 0 NaN
46266 2020-02-29 Nigeria 1 0 0 0.0
46267 2020-03-01 Nigeria 1 0 0 0.0
46268 2020-03-02 Nigeria 1 0 0 0.0
46269 2020-03-03 Nigeria 1 0 0 0.0

now let's plot the line chart

In [181]:
fig=px.line(df_Nigeria, x='Date', y='Infection Rate', title="Festive Season in Nigeria")
fig.add_shape(
    dict(
        type='line',
        x0=Nigeria_season_start_date,
        y0=0,
        x1=Nigeria_season_start_date,
        y1=df_Nigeria['Infection Rate'].max(),
        line=dict(color='yellow', width=2)  
    )
)
fig.add_annotation(
    dict(
        x=Nigeria_season_start_date,
        y=df_Nigeria['Infection Rate'].max(),
        text='starting festive season'
    )
)

fig.add_shape(
    dict(
        type='line',
        x0=Nigeria_season_a_month_later,
        y0=0,
        x1=Nigeria_season_a_month_later,
        y1=df_Nigeria['Infection Rate'].max(),
        line=dict(color='yellow', width=2)  
    )
)

fig.add_annotation(
    dict(
        x=Nigeria_season_a_month_later,
        y=df_Nigeria['Infection Rate'].min(),
        text='a month later'
    )
)
In [182]:
df_Nigeria['Deaths Rate']=df_Nigeria.Deaths.diff()
df_Nigeria.head()
fig=px.line(df_Nigeria,x='Date',y=['Infection Rate', 'Deaths Rate'])
fig.show()

let's do some scaling

In [183]:
df_Nigeria['Infection Rate']=df_Nigeria['Infection Rate']/df_Nigeria['Infection Rate'].max()
df_Nigeria['Deaths Rate']=df_Nigeria['Deaths Rate']/df_Nigeria['Deaths Rate'].max()

In [184]:
fig=px.line(df_Nigeria, x='Date', y='Infection Rate', title="Before and After Festive Season in Nigeria")
fig.add_shape(
    dict(
        type='line',
        x0=Nigeria_season_start_date,
        y0=0,
        x1=Nigeria_season_start_date,
        y1=df_Nigeria['Infection Rate'].max(),
        line=dict(color='yellow', width=2)  
    )
)
fig.add_annotation(
    dict(
        x=Nigeria_season_start_date,
        y=df_Nigeria['Infection Rate'].max(),
        text='starting date Festive Season'
    )
)

fig.add_shape(
    dict(
        type='line',
        x0=Nigeria_season_a_month_later,
        y0=0,
        x1=Nigeria_season_a_month_later,
        y1=df_Nigeria['Infection Rate'].max(),
        line=dict(color='yellow', width=2)  
    )
)

fig.add_annotation(
    dict(
        x=Nigeria_season_a_month_later,
        y=df_Nigeria['Infection Rate'].min(),
        text='End of festive season'
    )
)